TimeZero is a reasoning-guided large-scale vision-language model (LVLM) specifically designed for temporal video grounding (TVG) tasks, achieving dynamic video-language relationship analysis through reinforcement learning methods.
Video-to-Text
Transformers